Chapter 1: Introduction

New York City is one of the most famous cities in the United States and one of the world's financial, cultural, and entertainment centers. These factors attract tourists from around the world. According to the Mastercard Global Destination Index report, New York City was one of the 10 most visited cities in 2019. About 13.6 million foreign tourists traveled to New York last year; on average, they stayed approximately 8 nights and spent 152 dollars per day.

Airbnb is one way for international visitors to find accommodation when traveling abroad. In 2018, Airbnb had approximately 150 million guest users and 2.9 million hosts around the world, in more than 191 countries. According to Wachsmuth et al. (2018), New York is the third-largest Airbnb market in the world, with more than 40,000 house and apartment rental listings across Manhattan, the Bronx, Queens, Brooklyn, and Staten Island. They also found that New York’s Airbnb revenue increased by 14 percent, to $657 million, between 2016 and 2017, in line with growth in the number of Airbnb guests and visitors to New York. Taking all of these factors into account, Airbnb has clearly become a common way for tourists from around the world to find accommodation in New York City. Airbnb host data has therefore become a popular research topic.

This report focuses on Airbnb hosts in New York City in 2019 and contains 6 chapters: chapter 2 covers the source data and basic data visualization; …………………

Chapter 2: Description of Data

2.1 Source Data

In this study, the CSV data file comes from Inside Airbnb (http://insideairbnb.com/get-the-data.html). The dataset contains records for 48,895 Airbnb rentals in New York City in 2019, with 16 variables: listing ID, listing name, host ID, host name, neighborhood group, neighborhood, latitude, longitude, room type, price, minimum nights, number of reviews, date of last review, reviews per month, calculated host listings count, and availability over 365 days. The str() function in R gives a quick snapshot of the data as follows:

## 'data.frame':    48895 obs. of  16 variables:
##  $ id                            : int  2539 2595 3647 3831 5022 5099 5121 5178 5203 5238 ...
##  $ name                          : Factor w/ 47897 levels ""," 1 Bed Apt in Utopic Williamsburg ",..: 12564 38007 45009 15582 19210 24840 8248 24887 15477 17564 ...
##  $ host_id                       : int  2787 2845 4632 4869 7192 7322 7356 8967 7490 7549 ...
##  $ host_name                     : Factor w/ 11453 levels "","​ Valéria",..: 4997 4791 2913 6210 5929 1938 3549 9649 6880 1235 ...
##  $ neighbourhood_group           : Factor w/ 5 levels "Bronx","Brooklyn",..: 2 3 3 2 3 3 2 3 3 3 ...
##  $ neighbourhood                 : Factor w/ 221 levels "Allerton","Arden Heights",..: 109 128 95 42 62 138 14 96 203 36 ...
##  $ latitude                      : num  40.6 40.8 40.8 40.7 40.8 ...
##  $ longitude                     : num  -74 -74 -73.9 -74 -73.9 ...
##  $ room_type                     : Factor w/ 3 levels "Entire home/apt",..: 2 1 2 1 1 1 2 2 2 1 ...
##  $ price                         : int  149 225 150 89 80 200 60 79 79 150 ...
##  $ minimum_nights                : int  1 1 3 1 10 3 45 2 2 1 ...
##  $ number_of_reviews             : int  9 45 0 270 9 74 49 430 118 160 ...
##  $ last_review                   : Factor w/ 1765 levels "","1/1/13","1/1/15",..: 203 1059 1 1438 348 1234 277 1244 1383 1317 ...
##  $ reviews_per_month             : num  0.21 0.38 NA 4.64 0.1 0.59 0.4 3.47 0.99 1.33 ...
##  $ calculated_host_listings_count: int  6 2 1 1 1 1 1 1 1 4 ...
##  $ availability_365              : int  365 355 365 194 0 129 0 220 0 188 ...
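
A minimal sketch of how a snapshot like the one above could be produced. The inline CSV text is a three-row, six-column stand-in built from the first values shown in the output; in practice the `text` argument would be replaced by the path to the downloaded file, whose name is not given in this report. The object name `airbnb` is also an assumption.

```r
# Three-row stand-in for the Inside Airbnb CSV, using the first values
# shown in the str() output above (only six of the 16 columns).
csv_text <- "id,host_id,neighbourhood_group,room_type,price,minimum_nights
2539,2787,Brooklyn,Private room,149,1
2595,2845,Manhattan,Entire home/apt,225,1
3647,4632,Manhattan,Private room,150,3"

# stringsAsFactors = TRUE reproduces the Factor columns seen above;
# since R 4.0 the default is FALSE, which would give character columns.
airbnb <- read.csv(text = csv_text, stringsAsFactors = TRUE)

# Compact overview: one line per column with its type and first values.
str(airbnb)
```

With the real file, the same two calls apply; only the first argument of read.csv() changes.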

2.2 Geographic Coverage of Data

For basic data visualization, the latitude and longitude in the dataset are used to draw maps with the leaflet library in R.

(1) New York’s Airbnb with four different price groups.

For the first visualization, Airbnb hosts are divided into four price groups and plotted on the map. In the map below, 33,957 hosts charge $150 per night or less (yellow points), accounting for 69.4% of all Airbnb hosts in New York City. A further 13,894 hosts, or 28.4%, fall in the $151-$500 per night group (blue points). Finally, 805 and 239 hosts are in the $501-$1,000 (green points) and above $1,000 (red points) groups, respectively.
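
A minimal sketch of how the four price groups could be formed with cut(). The first ten prices are the values shown in the str() output in Section 2.1; the last two (650 and 1200) are hypothetical values added so that every group is populated, and the group labels are illustrative rather than taken from the report's code.

```r
# First ten prices from the str() output, plus two hypothetical high prices.
price <- c(149, 225, 150, 89, 80, 200, 60, 79, 79, 150, 650, 1200)

# cut() builds right-closed intervals (a, b] by default, so a price of
# exactly 150 falls in the first group and 151 in the second.
price_group <- cut(
  price,
  breaks = c(-Inf, 150, 500, 1000, Inf),
  labels = c("<= $150", "$151-$500", "$501-$1,000", "> $1,000")
)

# Count of listings per group and each group's share in percent.
counts <- table(price_group)
shares <- round(100 * prop.table(counts), 1)
print(counts)
print(shares)
```

The resulting factor can then be mapped to point colors when drawing the map.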

(2) New York’s Airbnb with three different room types.

Airbnb hosts can list entire homes/apartments, private rooms, or shared rooms. According to the map below, there are 25,409 entire homes/apartments (blue points), accounting for approximately 52.0% of all Airbnb hosts in New York City, compared with 22,326 private rooms (yellow points) and only 1,160 shared rooms (red points).

Chapter 3: Descriptive Statistics

##  [1] "id"                             "name"                          
##  [3] "host_id"                        "host_name"                     
##  [5] "neighbourhood_group"            "neighbourhood"                 
##  [7] "latitude"                       "longitude"                     
##  [9] "room_type"                      "price"                         
## [11] "minimum_nights"                 "number_of_reviews"             
## [13] "last_review"                    "reviews_per_month"             
## [15] "calculated_host_listings_count" "availability_365"
##        id                                         name      
##  Min.   :    2539   Hillside Hotel                  :   18  
##  1st Qu.: 9471945   Home away from home             :   17  
##  Median :19677284                                   :   16  
##  Mean   :19017143   New york Multi-unit building    :   16  
##  3rd Qu.:29152178   Brooklyn Apartment              :   12  
##  Max.   :36487245   Loft Suite @ The Box House Hotel:   11  
##                     (Other)                         :48805  
##     host_id                 host_name        neighbourhood_group
##  Min.   :     2438   Michael     :  417   Bronx        : 1091   
##  1st Qu.:  7822033   David       :  403   Brooklyn     :20104   
##  Median : 30793816   Sonder (NYC):  327   Manhattan    :21661   
##  Mean   : 67620011   John        :  294   Queens       : 5666   
##  3rd Qu.:107434423   Alex        :  279   Staten Island:  373   
##  Max.   :274321313   Blueground  :  232                         
##                      (Other)     :46943                         
##             neighbourhood      latitude       longitude     
##  Williamsburg      : 3920   Min.   :40.50   Min.   :-74.24  
##  Bedford-Stuyvesant: 3714   1st Qu.:40.69   1st Qu.:-73.98  
##  Harlem            : 2658   Median :40.72   Median :-73.96  
##  Bushwick          : 2465   Mean   :40.73   Mean   :-73.95  
##  Upper West Side   : 1971   3rd Qu.:40.76   3rd Qu.:-73.94  
##  Hell's Kitchen    : 1958   Max.   :40.91   Max.   :-73.71  
##  (Other)           :32209                                   
##            room_type         price         minimum_nights   
##  Entire home/apt:25409   Min.   :    0.0   Min.   :   1.00  
##  Private room   :22326   1st Qu.:   69.0   1st Qu.:   1.00  
##  Shared room    : 1160   Median :  106.0   Median :   3.00  
##                          Mean   :  152.7   Mean   :   7.03  
##                          3rd Qu.:  175.0   3rd Qu.:   5.00  
##                          Max.   :10000.0   Max.   :1250.00  
##                                                             
##  number_of_reviews  last_review    reviews_per_month
##  Min.   :  0.00           :10052   Min.   : 0.010   
##  1st Qu.:  1.00    6/23/19: 1413   1st Qu.: 0.190   
##  Median :  5.00    7/1/19 : 1359   Median : 0.720   
##  Mean   : 23.27    6/30/19: 1341   Mean   : 1.373   
##  3rd Qu.: 24.00    6/24/19:  875   3rd Qu.: 2.020   
##  Max.   :629.00    7/7/19 :  718   Max.   :58.500   
##                    (Other):33137   NA's   :10052    
##  calculated_host_listings_count availability_365
##  Min.   :  1.000                Min.   :  0.0   
##  1st Qu.:  1.000                1st Qu.:  0.0   
##  Median :  1.000                Median : 45.0   
##  Mean   :  7.144                Mean   :112.8   
##  3rd Qu.:  2.000                3rd Qu.:227.0   
##  Max.   :327.000                Max.   :365.0   
## 

Chapter 4: Linear Regression

##        id                                         name      
##  Min.   :    2539   Home away from home             :   12  
##  1st Qu.: 9045427   Loft Suite @ The Box House Hotel:   11  
##  Median :19175650   #NAME?                          :   10  
##  Mean   :18313347   Brooklyn Apartment              :    9  
##  3rd Qu.:27703016   Private Room                    :    9  
##  Max.   :36455809   Cozy Brooklyn Apartment         :    8  
##                     (Other)                         :32978  
##     host_id            host_name        neighbourhood_group
##  Min.   :     2571   Michael:  267   Bronx        :  826   
##  1st Qu.:  7248357   David  :  256   Brooklyn     :14526   
##  Median : 29464755   John   :  209   Manhattan    :13194   
##  Mean   : 64562362   Alex   :  184   Queens       : 4192   
##  3rd Qu.:101978485   Sarah  :  170   Staten Island:  299   
##  Max.   :273841667   Maria  :  155                         
##                      (Other):31796                         
##             neighbourhood      latitude       longitude     
##  Bedford-Stuyvesant: 2806   Min.   :40.51   Min.   :-74.24  
##  Williamsburg      : 2776   1st Qu.:40.69   1st Qu.:-73.98  
##  Harlem            : 1950   Median :40.72   Median :-73.95  
##  Bushwick          : 1767   Mean   :40.73   Mean   :-73.95  
##  East Village      : 1287   3rd Qu.:40.76   3rd Qu.:-73.93  
##  Hell's Kitchen    : 1191   Max.   :40.91   Max.   :-73.71  
##  (Other)           :21260                                   
##            room_type         price     minimum_nights   number_of_reviews
##  Entire home/apt:16086   Min.   :  0   Min.   : 1.000   Min.   :  1.00   
##  Private room   :16191   1st Qu.: 65   1st Qu.: 1.000   1st Qu.:  3.00   
##  Shared room    :  760   Median :100   Median : 2.000   Median : 11.00   
##                          Mean   :118   Mean   : 2.661   Mean   : 31.73   
##                          3rd Qu.:150   3rd Qu.: 3.000   3rd Qu.: 37.00   
##                          Max.   :334   Max.   :11.000   Max.   :629.00   
##                                                                          
##   last_review    reviews_per_month calculated_host_listings_count
##  6/23/19: 1311   Min.   : 0.010    Min.   :  1.000               
##  7/1/19 : 1242   1st Qu.: 0.210    1st Qu.:  1.000               
##  6/30/19: 1222   Median : 0.850    Median :  1.000               
##  6/24/19:  801   Mean   : 1.482    Mean   :  3.343               
##  7/7/19 :  674   3rd Qu.: 2.230    3rd Qu.:  2.000               
##  7/2/19 :  614   Max.   :58.500    Max.   :327.000               
##  (Other):27173                                                   
##  availability_365
##  Min.   :  0.0   
##  1st Qu.:  0.0   
##  Median : 38.0   
##  Mean   :103.7   
##  3rd Qu.:190.0   
##  Max.   :365.0   
## 

4.1 Linear Regression

Smart Question: How do minimum nights, number of reviews, reviews per month, calculated host listings count, and availability predict price?

## 
## Call:
## lm(formula = price ~ minimum_nights + number_of_reviews + reviews_per_month + 
##     calculated_host_listings_count + availability_365, data = nyc4)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -124.86  -50.88  -16.84   35.82  227.07 
## 
## Coefficients:
##                                  Estimate Std. Error t value Pr(>|t|)    
## (Intercept)                    109.646231   0.839378 130.628  < 2e-16 ***
## minimum_nights                   2.810407   0.210163  13.373  < 2e-16 ***
## number_of_reviews                0.005585   0.008675   0.644     0.52    
## reviews_per_month               -1.155127   0.256644  -4.501 6.79e-06 ***
## calculated_host_listings_count   0.304686   0.016975  17.949  < 2e-16 ***
## availability_365                 0.013871   0.003042   4.560 5.14e-06 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 65.71 on 33031 degrees of freedom
## Multiple R-squared:  0.01742,    Adjusted R-squared:  0.01727 
## F-statistic: 117.1 on 5 and 33031 DF,  p-value: < 2.2e-16
##                 minimum_nights              number_of_reviews 
##                       1.076810                       1.463123 
##              reviews_per_month calculated_host_listings_count 
##                       1.531107                       1.022583 
##               availability_365 
##                       1.108524

All variables except number_of_reviews are statistically significant at the one percent level. The intercept suggests that the predicted price of an Airbnb listing is about 109.65 dollars when all predictors equal zero. The coefficient on minimum_nights suggests that a one-unit increase in the minimum number of nights is associated with a price increase of 2.81 dollars. The coefficient on reviews_per_month suggests that one additional review per month is associated with a price decrease of 1.16 dollars. The coefficient on calculated_host_listings_count suggests that one additional host listing is associated with a price increase of 0.30 dollars (about 30 cents). The coefficient on availability_365 suggests that one additional day of availability during the year is associated with a price increase of 0.014 dollars (about 1.4 cents). The model explains little of the variation in price, however: the adjusted R-squared is 0.017. All VIF values are below 2, which suggests that multicollinearity is not a concern in this linear model.
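
To make the VIF check concrete, here is a minimal sketch that computes variance inflation factors by hand. The report does not say which function produced the VIF values above, so this is not necessarily the authors' method. The data are synthetic, and the names `sim`, `x1`, `x2`, `x3`, and `vif_manual` are assumptions made for illustration.

```r
set.seed(1)
n <- 500

# Synthetic predictors; x2 is deliberately built to correlate with x1.
x1 <- rnorm(n)
x2 <- 0.5 * x1 + rnorm(n)
x3 <- rnorm(n)
y  <- 100 + 3 * x1 - 1 * x2 + 0.3 * x3 + rnorm(n, sd = 5)
sim <- data.frame(y, x1, x2, x3)

# Multiple linear regression, analogous in form to the price model.
fit <- lm(y ~ x1 + x2 + x3, data = sim)
print(summary(fit)$coefficients)

# VIF_j = 1 / (1 - R_j^2), where R_j^2 is the R-squared from regressing
# predictor j on all the other predictors.
predictors <- c("x1", "x2", "x3")
vif_manual <- sapply(predictors, function(p) {
  others <- setdiff(predictors, p)
  r2 <- summary(lm(reformulate(others, response = p), data = sim))$r.squared
  1 / (1 - r2)
})
print(round(vif_manual, 3))
```

A VIF of 1 means a predictor is uncorrelated with the others. Values near 1, like those reported for the price model, indicate little inflation of the coefficient variances.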

Chapter 5: Neighborhood Groups, Room Types and Price

5.1 SMART Questions

Are Airbnb prices the same across New York City neighbourhood groups and across room types?

5.2 Are prices the same across NYC Neighbourhood Groups?

H0: Prices are the same across NYC neighbourhood groups. H1: There are significant differences in prices across NYC neighbourhood groups. We use ANOVA to test this hypothesis. Below is a summary of the results.

# Identify outliers in column `var` of data frame `dt` using the boxplot rule
# (boxplot.stats), plot the variable with and without them, and optionally
# replace them with NA in the global copy of `dt`.
outlierKD <- function(dt, var, rmv=NULL) { 
     var_name <- eval(substitute(var),eval(dt))
     na1 <- sum(is.na(var_name))                  # NAs present before removal
     m1 <- mean(var_name, na.rm = TRUE)           # mean with outliers
     par(mfrow=c(2, 2), oma=c(0,0,3,0))
     boxplot(var_name, main="With outliers")
     hist(var_name, main="With outliers", xlab=NA, ylab=NA)
     # Points beyond 1.5 times the interquartile range from the box hinges
     outlier <- boxplot.stats(var_name)$out
     mo <- mean(outlier)
     var_name <- ifelse(var_name %in% outlier, NA, var_name)
     boxplot(var_name, main="Without outliers")
     hist(var_name, main="Without outliers", xlab=NA, ylab=NA)
     title("Outlier Check", outer=TRUE)
     na2 <- sum(is.na(var_name))                  # NAs after removal
     cat("Outliers identified:", na2 - na1, "\n")
     # Outliers as a percentage of the values that remain
     cat("Proportion (%) of outliers:", round((na2 - na1) / sum(!is.na(var_name))*100, 1), "\n")
     cat("Mean of the outliers:", round(mo, 2), "\n")
     m2 <- mean(var_name, na.rm = TRUE)           # mean without outliers
     cat("Mean without removing outliers:", round(m1, 2), "\n")
     cat("Mean if we remove outliers:", round(m2, 2), "\n")
     # Ask interactively unless the caller already supplied an answer
     if(is.null(rmv)) { 
       response <- readline(prompt="Do you want to remove outliers and to replace with NA? [yes/no]: ") 
     } else {
       if (rmv=='y'|rmv=='yes'|rmv=='Y'|rmv=='Yes'|rmv=='YES'|rmv==TRUE ) { response = 'y' } else { response = 'n' }
     }
     # Write the cleaned column back to the data frame in the global environment
     if(response == "y" | response == "yes"){
          dt[as.character(substitute(var))] <- invisible(var_name)
          assign(as.character(as.list(match.call())$dt), dt, envir = .GlobalEnv)
          cat("Outliers successfully removed", "\n")
          return(invisible(dt))
     } else{
          cat("Nothing changed", "\n")
          return(invisible(var_name))
     }
}
outlierKD(d, price,'y')

## Outliers identified: 2972 
## Proportion (%) of outliers: 6.5 
## Mean of the outliers: 658.78 
## Mean without removing outliers: 152.72 
## Mean if we remove outliers: 119.97 
## Outliers successfully removed
summary(d)
##     neighbourhood_group           room_type         price     
##  Bronx        : 1091    Entire home/apt:25409   Min.   :  0   
##  Brooklyn     :20104    Private room   :22326   1st Qu.: 65   
##  Manhattan    :21661    Shared room    : 1160   Median :100   
##  Queens       : 5666                            Mean   :120   
##  Staten Island:  373                            3rd Qu.:159   
##                                                 Max.   :334   
##                                                 NA's   :2972
d1 <- na.omit(d)
summary(d1)
##     neighbourhood_group           room_type         price    
##  Bronx        : 1070    Entire home/apt:22789   Min.   :  0  
##  Brooklyn     :19415    Private room   :21996   1st Qu.: 65  
##  Manhattan    :19506    Shared room    : 1138   Median :100  
##  Queens       : 5567                            Mean   :120  
##  Staten Island:  365                            3rd Qu.:159  
##                                                 Max.   :334
##                        Df    Sum Sq Mean Sq F value Pr(>F)    
## neighbourhood_group     4  24781929 6195482    1509 <2e-16 ***
## Residuals           45918 188500166    4105                   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

##   Tukey multiple comparisons of means
##     95% family-wise confidence level
## 
## Fit: aov(formula = price ~ neighbourhood_group, data = d1)
## 
## $neighbourhood_group
##                                diff        lwr        upr     p adj
## Brooklyn-Bronx           28.3341931  22.845989  33.822397 0.0000000
## Manhattan-Bronx          68.5874145  63.099879  74.074950 0.0000000
## Queens-Bronx             11.5390163   5.705153  17.372880 0.0000007
## Staten Island-Bronx      11.8701959   1.276184  22.464208 0.0190278
## Manhattan-Brooklyn       40.2532213  38.481432  42.025010 0.0000000
## Queens-Brooklyn         -16.7951768 -19.452272 -14.138082 0.0000000
## Staten Island-Brooklyn  -16.4639973 -25.697593  -7.230402 0.0000114
## Queens-Manhattan        -57.0483982 -59.704112 -54.392684 0.0000000
## Staten Island-Manhattan -56.7172186 -65.950417 -47.484021 0.0000000
## Staten Island-Queens      0.3311796  -9.111959   9.774318 0.9999811
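
The ANOVA table and Tukey comparisons above follow a pattern that can be sketched with base R alone. The example below uses synthetic prices for three hypothetical groups (`A`, `B`, `C`). The data frame `toy` and its values are assumptions for illustration, not the Airbnb data.

```r
set.seed(42)

# Synthetic prices for three hypothetical groups with different true means.
toy <- data.frame(
  group = factor(rep(c("A", "B", "C"), each = 50)),
  price = c(rnorm(50, mean = 80,  sd = 20),
            rnorm(50, mean = 110, sd = 20),
            rnorm(50, mean = 150, sd = 20))
)

# One-way ANOVA: H0 is that all group means are equal.
fit <- aov(price ~ group, data = toy)
print(summary(fit))

# Tukey's HSD compares every pair of groups while controlling the
# family-wise error rate; "p adj" is the adjusted p-value for each pair.
tukey <- TukeyHSD(fit, conf.level = 0.95)
print(tukey)
```

Each row of the Tukey output is read the same way as in the table above: `diff` is the difference in group means, `lwr` and `upr` bound its confidence interval, and an interval that excludes zero corresponds to a significant difference.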

5.3 Are prices the same across Airbnb Room Types?

H0: Prices are the same across Airbnb room types. H1: There are significant differences in prices across Airbnb room types. We use ANOVA to test this hypothesis. Below is a summary of the results.

summary(d$room_type)
## Entire home/apt    Private room     Shared room 
##           25409           22326            1160
##                Df    Sum Sq  Mean Sq F value Pr(>F)    
## d$room_type     2  82350853 41175426   14441 <2e-16 ***
## Residuals   45920 130931242     2851                   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 2972 observations deleted due to missingness

##   Tukey multiple comparisons of means
##     95% family-wise confidence level
## 
## Fit: aov(formula = d$price ~ d$room_type, data = d1)
## 
## $`d$room_type`
##                                    diff        lwr       upr p adj
## Private room-Entire home/apt  -83.50859  -84.69151 -82.32568     0
## Shared room-Entire home/apt  -103.23360 -107.03491 -99.43229     0
## Shared room-Private room      -19.72501  -23.52957 -15.92044     0

Chapter 6: Neighborhood Groups, Room Types and Availability

6.1 Neighborhood Groups and Availability

Smart Question: What is the least available neighborhood group?

Here, we try to answer the question “What is the least available neighborhood group, and how can we rank the groups by availability?” If low availability is read as a sign of popularity, the most popular neighborhood group is Brooklyn, followed by Manhattan, Queens, the Bronx, and Staten Island, respectively. The table below shows the number of observations in each group.

##         Bronx      Brooklyn     Manhattan        Queens Staten Island 
##          1091         20104         21661          5666           373

If we rank the neighborhood groups by availability (using either the median or the mean of availability_365), the least available neighborhood group is Brooklyn, followed by Manhattan, Queens, the Bronx, and Staten Island, respectively.

Brooklyn
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##     0.0     0.0    28.0   100.2   188.0   365.0

Let's look at the least available neighborhood group, Brooklyn. The median for Brooklyn is 28 days, meaning half of the listings in Brooklyn are available for 28 days or fewer out of 365. In short, the median listing is available about 7.7% of the time.

Manhattan
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##       0       0      36     112     230     365

Let's look at the second least available neighborhood group, Manhattan. The median for Manhattan is 36 days, meaning half of the listings in Manhattan are available for 36 days or fewer out of 365. In short, the median listing is available about 9.9% of the time.

Queens
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##     0.0     2.0    98.0   144.5   286.0   365.0

Let's look at the third least available neighborhood group, Queens. The median for Queens is 98 days, meaning half of the listings in Queens are available for 98 days or fewer out of 365. In short, the median listing is available about 26.8% of the time.

Bronx
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##     0.0    37.0   148.0   165.8   313.5   365.0

Let's look at the fourth least available neighborhood group, the Bronx. The median for the Bronx is 148 days, meaning half of the listings in the Bronx are available for 148 days or fewer out of 365. In short, the median listing is available about 40.5% of the time.

Staten Island
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##     0.0    78.0   219.0   199.7   333.0   365.0

Let's look at the most available neighborhood group, Staten Island. The median for Staten Island is 219 days, meaning half of the listings in Staten Island are available for 219 days or fewer out of 365. In short, the median listing is available 60% of the time.
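
The percentages quoted for each neighborhood group are the median availability divided by 365. A minimal sketch of that arithmetic, using the five medians reported in the summaries above (the vector names `median_days`, `share_of_year`, and `ranked` are assumptions):

```r
# Median availability_365 per neighborhood group, as reported above.
median_days <- c(Brooklyn = 28, Manhattan = 36, Queens = 98,
                 Bronx = 148, "Staten Island" = 219)

# Share of the year for which the median listing is available, in percent.
share_of_year <- round(100 * median_days / 365, 1)

# Rank from least to most available.
ranked <- sort(share_of_year)
print(ranked)
```

Sorting in increasing order reproduces the ranking in the text, with Brooklyn first and Staten Island last.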

6.1.1 ANOVA and Tukey for Neighborhood Group and Availability

Smart Question: Are the differences between availabilities significant?

Given the availabilities above, are the differences across the neighborhood groups significant?

##                        Df    Sum Sq Mean Sq F value Pr(>F)    
## neighbourhood_group     4  14741568 3685392   216.5 <2e-16 ***
## Residuals           48890 832318962   17024                   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##   Tukey multiple comparisons of means
##     95% family-wise confidence level
## 
## Fit: aov(formula = availability_365 ~ neighbourhood_group, data = Airbnbdata)
## 
## $neighbourhood_group
##                              diff        lwr        upr     p adj
## Brooklyn-Bronx          -65.52664 -76.590498 -54.462791 0.0000000
## Manhattan-Bronx         -53.77953 -64.822893 -42.736160 0.0000000
## Queens-Bronx            -21.30712 -33.074224  -9.540014 0.0000078
## Staten Island-Bronx      33.91935  12.571845  55.266850 0.0001425
## Manhattan-Brooklyn       11.74712   8.261586  15.232650 0.0000000
## Queens-Brooklyn          44.21953  38.866233  49.572819 0.0000000
## Staten Island-Brooklyn   99.44599  80.847367 118.044617 0.0000000
## Queens-Manhattan         32.47241  27.161586  37.783230 0.0000000
## Staten Island-Manhattan  87.69887  69.112429 106.285319 0.0000000
## Staten Island-Queens     55.22647  36.201095  74.251837 0.0000000

The means and medians of availability vary across neighborhood groups, so we performed an ANOVA test to see whether there is a true difference between the group means. The p-value is less than 0.05, and therefore we reject the null hypothesis. This means there is a significant difference between the neighborhood groups in terms of their availability.

We also performed a Tukey test. All of the adjusted p-values are less than 0.05, which means every pairwise difference between neighborhood groups is significant.

6.2 Room Types and Availability

Smart Question: What is the most available type of room?

## Entire home/apt    Private room     Shared room 
##           25409           22326            1160

The table above shows the number of observations for each room type. Based on room type, shared rooms are the most available, while private rooms and entire homes/apartments are almost equally available. We will also determine whether the difference in means is significant.

Entire Home Availability
##  availability_365
##  Min.   :  0.0   
##  1st Qu.:  0.0   
##  Median : 42.0   
##  Mean   :111.9   
##  3rd Qu.:229.0   
##  Max.   :365.0
Private Room Availability
##  availability_365
##  Min.   :  0.0   
##  1st Qu.:  0.0   
##  Median : 45.0   
##  Mean   :111.2   
##  3rd Qu.:214.0   
##  Max.   :365.0
Shared Room Availability
##  availability_365
##  Min.   :  0     
##  1st Qu.:  0     
##  Median : 90     
##  Mean   :162     
##  3rd Qu.:341     
##  Max.   :365

6.2.1 ANOVA and Tukey for Room Type and Availability

Smart Question: Are the differences between availabilities significant?

##                Df    Sum Sq Mean Sq F value Pr(>F)    
## room_type       2   2884561 1442280   83.53 <2e-16 ***
## Residuals   48892 844175969   17266                   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##   Tukey multiple comparisons of means
##     95% family-wise confidence level
## 
## Fit: aov(formula = availability_365 ~ room_type, data = Airbnbdata)
## 
## $room_type
##                                    diff       lwr       upr     p adj
## Private room-Entire home/apt -0.7163712 -3.541374  2.108632 0.8231684
## Shared room-Entire home/apt  50.0805582 40.834331 59.326785 0.0000000
## Shared room-Private room     50.7969294 41.522872 60.070987 0.0000000

The means and medians of availability vary across room types, so we performed an ANOVA test to see whether there is a true difference between the group means. The p-value is less than 0.05, and therefore we reject the null hypothesis. This means there is a significant difference between the room types in terms of their availability.

We also performed a Tukey test. The adjusted p-value for private room versus entire home/apt is about 0.82, well above 0.05, so the difference between them is not significant and there is no true difference between their means. Both comparisons involving shared rooms are significant.